METHOD AND APPARATUS FOR AUTOMATICALLY UPDATING STOCK AND
MUTUAL FUND GRAMMARS IN SPEECH RECOGNITION SYSTEMS 



BACKGROUND OF THE INVENTION 

1. Technical Field

The present invention relates generally to speech 
recognition systems and, in particular, to a method and 
apparatus for automatically updating stock and mutual fund 
grammars in speech recognition systems. 

2. Description of Related Art

Speech recognition technology is becoming more and more 
widely used in financial applications, such as in stock and 
mutual fund trading or information inquiry. In these 
applications, a good grammar on the stock and mutual fund 
names is vital to the performance of the speech recognition 
system. In the past, such grammars were manually generated,
which required several months of difficult work due to the
complexity of the task. The manual generation of such
grammars is complex for several reasons. First, most stock
names published at web sites contain abbreviated words and
are, thus, incomplete. Second, the nicknames of most
companies are not readily available. Third, some
statistical parameters must be adjusted to achieve an
acceptable degree of performance from the speech recognition
system. Finally, some words are pronounced differently
depending on the speaker.

Given that there are tens of thousands of stock and 
mutual fund names in the market and that significant numbers 
of companies are coming into and going out of the market on 
a daily basis, building an efficient and up-to-date stock 
and mutual fund grammar by hand is not only expensive, but 
it is also not feasible. Therefore, there is a need for a 
method and apparatus that automatically generates grammars 
of adequate quality for financial applications in a speech 
recognition system. 

SUMMARY OF THE INVENTION 

The problems stated above, as well as other related 
problems of the prior art, are solved by the present 
invention, a method and apparatus for automatically updating
stock and mutual fund grammars in speech recognition
systems.

According to an aspect of the present invention, there 
is provided a method for automatically updating stock and 
mutual fund grammars in a speech recognition system. The 
method comprises the step of automatically updating, on a 
pre-specified basis, a database having a plurality of
entries. Each entry respectively corresponds to a publicly 
traded stock or a publicly traded fund, and respectively 
comprises at least one name of the publicly traded stock or 
publicly traded fund, a weight for the at least one name, 
and baseforms of the at least one name. A grammar file for 
names in the database is automatically updated. The grammar 
file includes the names and weights for the names. 

According to another aspect of the present invention, 
the updating step comprises the steps of automatically 
identifying, from web sites, stocks and funds that are no 
longer listed on a market, and automatically removing from 
the database any of the plurality of entries corresponding 
to the identified stocks and funds. 

According to yet another aspect of the present 
invention, the updating step comprises the steps of 
automatically identifying, from web sites, newly listed 
stocks and newly listed funds, if any, and automatically
creating an entry in the database for each of the newly
listed stocks and the newly listed funds.

According to still another aspect of the invention, the 
updating step comprises the steps of identifying the 
transaction volumes of any stocks and funds for which an 
entry exists in the database, quantizing the transaction 
volumes into a plurality of bands, and assigning a 
corresponding weight to each of the plurality of bands. 

According to still yet another aspect of the invention, 
the method further comprises the step of automatically 
combining short words in the database to form combined 
words. A short word is a stock name or a fund name that has
fewer than a predefined number of phonemes. The baseforms
for the combined words are automatically generated. The
grammar file is updated to include the combined words.

According to a further aspect of the invention, the 
step of updating the database comprises the step of 
automatically adapting the weights for the names in the 
database, based upon a transaction volume over a 
predetermined period of time. 

These and other aspects, features and advantages of the 
present invention will become apparent from the following 
detailed description of preferred embodiments, which is to 
be read in connection with the accompanying drawings. 



YOR9-2001-0451US1 (8728-528)



BRIEF DESCRIPTION OF THE DRAWINGS 



FIG. 1 is a block diagram illustrating an apparatus 100
for automatically updating, on a pre-specified basis, stock
and mutual fund grammars in a speech recognition system,
according to an illustrative embodiment of the present
invention; and

FIG. 2 is a flow diagram illustrating a method for
automatically updating, on a pre-specified basis, stock and
mutual fund grammars in a speech recognition system,
according to an illustrative embodiment of the present
invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

It is to be understood that the present invention may 
be implemented in various forms of hardware, software, 
firmware, special purpose processors, or a combination 
thereof. Preferably, the present invention is implemented 
as a combination of both hardware and software, the software 
being an application program tangibly embodied on a program 
storage device. The application program may be uploaded to, 
and executed by, a machine comprising any suitable 
architecture. Preferably, the machine is implemented on a 
computer platform having hardware such as one or more 
central processing units (CPU), a random access memory
(RAM), and input/output (I/O) interface(s). The computer
platform also includes an operating system and 
microinstruction code. The various processes and functions 
described herein may either be part of the microinstruction 
code or part of the application program (or a combination 
thereof) which is executed via the operating system. In 
addition, various other peripheral devices may be connected 
to the computer platform such as an additional data storage
device.

It is to be further understood that, because some of 
the constituent system components depicted in the 
accompanying Figures may be implemented in software, the 
actual connections between the system components may differ 
depending upon the manner in which the present invention is 
programmed. Given the teachings herein, one of ordinary 
skill in the related art will be able to contemplate these 
and similar implementations or configurations of the present
invention.

FIG. 1 is a block diagram illustrating an apparatus 100
for automatically updating, on a pre-specified basis, stock
and mutual fund grammars in a speech recognition system,
according to an illustrative embodiment of the present
invention. The apparatus 100 includes a database or data
structure 110 (hereinafter "database"), a web extractor 115,
a database update device 120, a grammar generator 125, a
baseform generator 130, and a short word combiner 135.
While the present invention is described with respect to
stocks and mutual funds, it is to be appreciated that the
present invention may be applied to any type of financial
commodity which is traded on any given financial market.
Further, while the stock and mutual fund grammars are
described herein as being updated "on a pre-specified
basis", it is preferable that such updating occur on a daily
basis. Moreover, while the web extractor 115 is described
with respect to the web, it is to be appreciated that the
functions of the web extractor 115 may be performed with
respect to any data source or network from which information
can be extracted for use by the present invention. The
operation of the elements of apparatus 100 will now be
described with respect to FIG. 2.

FIG. 2 is a flow diagram illustrating a method for 
automatically updating, on a pre-specified basis, stock and
mutual fund grammars in a speech recognition system,
according to an illustrative embodiment of the present
invention.

A database 110 is constructed (step 210), which
includes the following information for each stock and mutual
fund symbol: (a) the original name appearing at the web
sites; (b) the resolved name, which is the name of the fund
after resolving word abbreviations, removing name
ambiguities, and so forth; (c) potential nicknames; (d)
weights for the symbols; and (e) all possible baseforms for
each word. It is to be appreciated that while the database
110 is described to include the preceding specified
information, other information may be used in addition to,
or in substitution of, the above specified information or a
portion(s) thereof. Given the teachings of the present
invention provided herein, one of ordinary skill in the 
related art will readily contemplate other information that 
can be included in database 110 as well as which of the 
above specified information can be substituted or removed 
altogether, if so desired, all the while maintaining the 
spirit and scope of the present invention. 
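
As a rough illustration, one entry of database 110 holding
items (a) through (e) could be modeled as follows. This is a
minimal sketch in Python, not the claimed implementation: the
class name SymbolEntry, its field names, the "NYSE_IBM" symbol
string, and the resolved name shown are all assumptions made
for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SymbolEntry:
    """One hypothetical entry of database 110, keyed by symbol."""
    symbol: str                 # hypothetical symbol string
    original_name: str          # (a) name as published at the web site
    resolved_name: str          # (b) name after resolving abbreviations
    nicknames: List[str] = field(default_factory=list)  # (c) nicknames
    weight: float = 1.0         # (d) weight for the symbol
    # (e) every known baseform (phoneme sequence) for each word
    baseforms: Dict[str, List[List[str]]] = field(default_factory=dict)

    def names(self) -> List[str]:
        """Return every name under which the symbol may be spoken."""
        return [self.resolved_name, *self.nicknames]


# Illustrative values only; the resolved name is an assumption.
entry = SymbolEntry(
    symbol="NYSE_IBM",
    original_name="INTERNATIONAL BUSINESS MACHINES CORP.",
    resolved_name="INTERNATIONAL BUSINESS MACHINES CORPORATION",
    nicknames=["IBM"],
)
```

Keeping the original and resolved names side by side would let
a later update compare freshly extracted names against what
was previously stored.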

The rationale for including the above items in the
database 110 will now be given. The fund names that appear
at a web site generally include abbreviations. For example,
a fund name appearing at a web site may be "CT HOLDINGS,
INC.", where "CT" is an abbreviated form of the word
"court", which should be resolved. A company may own
several different stock symbols which might be represented
by the same name. For example, the symbols "T", "LMGA",
"LMGB", and "AWE" are all represented by "AT&T CORP.",
while in fact they represent the following different fund
names: "AT&T Corp.", "AT&T Liberty Media Corp.", "AT&T
Corp. Class B", and "AT&T Wireless Group", respectively.
These different fund names should also be resolved. At the
web site of a particular company, generally, only the
official name of that company is specified, such as
"INTERNATIONAL BUSINESS MACHINES CORP.". However, in real
life, people are apt to use nicknames, such as "IBM". Thus,
it is preferable that all possible nicknames of a company
are added into the stock grammar. In speaker-independent
speech recognition systems, some words have different
pronunciations depending on the speaker. Therefore, it is
preferable to list all possible baseforms for each word in
the vocabulary. This is achieved by listening to numerous
live audio data of stock and mutual fund names. In real
life, not all fund names are used with the same probability.
Assigning different probabilities to different stock names
based on frequency of use could enhance the performance of
the speech recognition system.
The initial weight for each fund is determined
according to the following method, represented by steps
110a-c in FIG. 2. The transaction volumes of all stocks and
mutual funds in the database are identified (step 110a) and
quantized into several different bands (also referred to
herein as subsets) (step 110b). Each of the bands is
assigned a weight value (step 110c). The number of
bands to use may be determined arbitrarily and optionally
modified based on experimental results, or may be based on
pre-specified criteria such as, for example, the transaction
volume. It is to be appreciated that the preceding
pre-specified criteria are merely illustrative and, thus,
other criteria may be used. The value assigned to each band
may also be based on pre-identified criteria or may be
arbitrarily selected and then modified based on experimental
results. The pre-specified criteria for assigning a weight
value to each band may include, for example, the
transaction volume. It is to be appreciated that while the
determination of the number of bands and the values of the
weights have been described with respect to the transaction
volume, other information may be used in conjunction with or
in place of the transaction volume. Given the teachings of
the present invention provided herein, one of ordinary skill
in the related art will contemplate these and various other
criteria for determining how many bands to use, as well as
the values assigned to each band, while maintaining the
spirit and scope of the present invention.

According to one illustrative embodiment of the present 
invention, steps 110b-c above are implemented such that the
weight increases by a factor of two with an increase in the
band number. However, in such a case, there must be some 
restriction on the band number N such that log(N) will not 
exceed the value of the dynamic score range of the searching 
process during speech recognition. Otherwise, the stock 
symbols in the band with the lowest weight will have no 
chance to be recognized, since they may be pruned out of the 
search space. 

According to the preceding illustrative embodiment
regarding steps 110b-c, the symbols are classified into two
subsets. The symbols whose transaction volume is larger
than the average transaction volume for all of the symbols
in the database are assigned to subset 1; the remaining
symbols are assigned to subset 2. All symbols in subset 1
are assigned a weight value of 1.

The symbols in subset 2 are then classified into two
subsets. All symbols whose transaction volume is larger than
the average transaction volume of the symbols in subset 2
are assigned to subset 21; the remaining symbols in
subset 2 are assigned to subset 22. All the symbols in
subset 21 are assigned a weight value of 0.5.

Similar to the preceding step, all symbols in subset 22
are classified into two subsets 221 and 222. All symbols in
subset 221 are assigned a weight value of 0.25.
All symbols in subset 222 are classified into two
subsets 2221 and 2222, and so forth, until 14 subsets are
obtained, with the weight of the first subset being 1, the
second subset 0.5, the third subset 0.25, and so on, down to
the 14th subset at 1/(2**13) = 1/8192 = 0.000122. As noted
above, this is but one illustrative implementation for
determining the number of bands and the values of weights
and, thus, other methodologies for accomplishing the same
may be employed while maintaining the spirit and scope of
the present invention.
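
The repeated split about the average described above can be
sketched in Python as follows. This is an illustrative sketch
only. The function name assign_band_weights and its parameters
are assumptions. The early exit when no remaining symbol
exceeds the average (for example, when all remaining volumes
are equal) is also an assumption, since the description does
not address that case.

```python
from typing import Dict


def assign_band_weights(volumes: Dict[str, float],
                        max_bands: int = 14) -> Dict[str, float]:
    """Map each symbol to a weight of 1, 0.5, 0.25, ... by band."""
    weights: Dict[str, float] = {}
    remaining = dict(volumes)   # symbols not yet placed in a band
    weight = 1.0                # weight of the band being filled
    for _ in range(max_bands - 1):
        if not remaining:
            break
        mean = sum(remaining.values()) / len(remaining)
        upper = [s for s, v in remaining.items() if v > mean]
        if not upper:
            # Assumption: nothing exceeds the average, so stop splitting.
            break
        for symbol in upper:
            weights[symbol] = weight
            del remaining[symbol]
        weight /= 2.0           # the next band gets half the weight
    # Whatever is left forms the final band at the current weight.
    for symbol in remaining:
        weights[symbol] = weight
    return weights


example = assign_band_weights({"A": 1000, "B": 100, "C": 10, "D": 1})
```

With max_bands left at 14, the lowest weight that can be
assigned is 1/(2**13), matching the figure given above.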

It is to be appreciated that the construction of the 
database at step 110 may be performed using, at the least, 
the database update device 120 and the web extractor 115. 
The web extractor 115 could initially extract the stock and 
mutual fund names from web sites (as well as any nicknames, 
transaction volumes, and so forth), and the database update
device 120 could resolve the extracted names, calculate the
initial weights, and so forth. Of course, other
arrangements are possible, including receiving and using a
database which has already been constructed. Such a
pre-constructed database could have an expiration date
associated therewith, given the potential volume of changes
that could occur in such a database over a very short period
of time (e.g., new stocks and funds being included in the
market and other stocks and funds being removed/delisted
from the market).

Stock names and mutual fund names, as well as
information corresponding thereto (e.g., nicknames,
transaction volumes, and so forth), are extracted from a set
of stock exchange web sites (step 220) by the web extractor
115. Step 220 includes the step of identifying any stock
names and mutual fund names that are no longer valid (i.e.,
the stocks and mutual funds that are no longer in the market
(no longer traded/listed)) (step 220a), as well as new
(e.g., newly listed) stocks and mutual funds (step 220b).
In the illustrative embodiment of the present invention, the 
following seven stock exchange web sites are used: American 
Exchange; Canadian Dealer's Network Exchange; Montreal Stock 
Exchange; NASDAQ; New York Stock Exchange; OTC Bulletin 
Board; and Toronto Stock Exchange. Of course, other stock 
exchanges can be used, while maintaining the spirit and 
scope of the present invention. 
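
One simple way to picture steps 220a and 220b is as a set
comparison between the symbols already in the database and
the symbols just extracted. The Python sketch below is an
assumption about how this could be done, not a statement of
how the web extractor 115 works. The function name
diff_symbols and the symbol strings are hypothetical.

```python
from typing import Iterable, Set, Tuple


def diff_symbols(known: Iterable[str],
                 extracted: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (no_longer_listed, newly_listed) symbol sets."""
    known_set, extracted_set = set(known), set(extracted)
    no_longer_listed = known_set - extracted_set   # cf. step 220a
    newly_listed = extracted_set - known_set       # cf. step 220b
    return no_longer_listed, newly_listed


# Hypothetical symbols, for illustration only.
gone, new = diff_symbols(
    known=["NYSE_AAM", "NYSE_XYZ"],
    extracted=["NYSE_AAM", "NASDAQ_NEW"],
)
```

In practice a symbol could be missing from one extraction
simply because a site was unreachable, so a real system would
likely confirm a removal before acting on it.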

The database 110 is automatically updated (step 230) by
the database update device 120, based upon a result of step
220. Step 230 may include deleting one or more existing
entries (step 230a) and/or creating one or more new entries
(step 230b). For example, at step 230, entries
corresponding to stocks and/or mutual funds that are no
longer traded are removed from the database 110 (step 230a)
and entries corresponding to new stocks and funds are added
to the database (step 230b). Moreover, step 230 includes
the step of adapting the weight for each stock symbol based
on the transaction volume of the corresponding stock or fund
over a predefined time period (e.g., last two weeks) (step
230c). Such adaptation is performed by the database update
device 120. At step 230, it is preferable that a user
manually check the new fund names and any appropriate
nicknames, if possible.
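
Steps 230a-c can be sketched in Python under some loudly
stated assumptions: the database is modeled as a plain
dictionary keyed by symbol, and "transaction volume over a
predefined time period" is taken to mean the sum of the most
recent daily volumes. The names apply_update and
trailing_volume are hypothetical, and the description does
not prescribe how the adapted weight is computed from the
volume.

```python
from typing import Dict, Iterable, List


def apply_update(db: Dict[str, dict],
                 no_longer_listed: Iterable[str],
                 new_entries: Dict[str, dict]) -> None:
    """Delete stale entries and add new ones, in place."""
    for symbol in no_longer_listed:
        db.pop(symbol, None)            # cf. step 230a; ignore if absent
    for symbol, entry in new_entries.items():
        db.setdefault(symbol, entry)    # cf. step 230b; keep existing entry


def trailing_volume(daily_volumes: List[float], days: int = 14) -> float:
    """Sum the last `days` values of a per-day volume series."""
    if days <= 0:
        raise ValueError("days must be positive")
    return float(sum(daily_volumes[-days:]))


db = {"A": {"name": "a"}, "B": {"name": "b"}}
apply_update(db, ["B"], {"C": {"name": "c"}})
recent = trailing_volume([1, 2, 3, 4], days=2)
```

The trailing totals could then be passed to whatever banding
procedure is in use to obtain the adapted weights (cf. step
230c).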

A grammar file is automatically constructed from the 
database (step 240), by the grammar generator 125. The 
grammar file includes a plurality of entries, with each 
entry corresponding to a stock or mutual fund. In 
particular, each entry includes, for a given symbol 
representing a stock or mutual fund, a weight for the symbol 
and different names for the stock or mutual fund with 
optional words. 

An example of two entries in the grammar file is as
follows:

+0.010129856039 AMERICAN ANNUITY GROUP CAPITAL TRUST : NYSE_AAGPRT
+0.270129494365 AAMES FINANCIAL CORP : NYSE_AAM

It is to be appreciated that the above configuration of
the grammar file is for illustrative purposes and, thus, 
other configurations of the grammar file may be employed, 
while maintaining the spirit and scope of the present 
invention. 
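
Reading the two sample entries as "+weight, name words, a
colon, then the symbol", a formatter and parser for such
lines might look like the following Python sketch. The
interpretation of the fields, the function names, and the
12-decimal precision (chosen to match the samples) are
assumptions. The optional words and alternative names
mentioned above are not handled.

```python
from typing import Tuple


def format_entry(weight: float, name: str, symbol: str) -> str:
    """Render one grammar line in the style of the samples above."""
    return f"+{weight:.12f} {name} : {symbol}"


def parse_entry(line: str) -> Tuple[float, str, str]:
    """Split a grammar line into (weight, name, symbol)."""
    left, symbol = line.rsplit(":", 1)          # symbol follows last colon
    weight_text, name = left.strip().split(None, 1)
    return float(weight_text), name.strip(), symbol.strip()


line = format_entry(0.270129494365, "AAMES FINANCIAL CORP", "NYSE_AAM")
fields = parse_entry(line)
```

Splitting on the last colon keeps the parser tolerant of
colons that might appear inside a name.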

Baseforms of the new words are automatically generated 
from the grammar file (step 250), by the baseform generator
130. Preferably, the baseforms generated by the baseform 
generator 130 at step 250 are manually checked by a user. 
In the context of step 250, the phrase "new words" refers to 
those words for which baseforms have not yet been created. 
An example of a baseform file is as follows: 

AAMES      AA M Z
AMERICAN   AX M EH R IX K AX N
ANNUITY    AX N Y UW IX T IY
CORP       K AO R P AXR EY SH AX N
FINANCIAL  F AY N AE N SH AX L
GROUP      G R UW PD
CAPITAL    K AE P IX T AX L
TRUST      T R AH S TD
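
A baseform file laid out like the example above could be
loaded into a lookup table as sketched below in Python. The
sketch assumes one baseform per line, with the word first and
its phonemes after it, and assumes that a word repeated on
several lines carries alternative pronunciations. The
function name parse_baseforms is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_baseforms(text: str) -> Dict[str, List[List[str]]]:
    """Map each word to the list of its phoneme sequences."""
    table: Dict[str, List[List[str]]] = defaultdict(list)
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue                    # skip blank or malformed lines
        word, phonemes = tokens[0], tokens[1:]
        table[word].append(phonemes)
    return dict(table)


sample = "AAMES AA M Z\n\nAMERICAN AX M EH R IX K AX N\n"
table = parse_baseforms(sample)
```

Such a table gives the phoneme count of each word directly,
which is the quantity used when deciding whether a word is
short.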

Short words (i.e., words having fewer than a predefined
number of phonemes) are automatically combined by the short
word combiner 135 to form combined words (step 260). The
weights for the combined words are then automatically
generated by the database update device 120 (although the
combined words need not be, and in the preferred embodiment
are not, included in the database) (step 265). Moreover, all
possible baseforms of the combined words are then
automatically generated by the baseform generator 130 (step
270). The short words are combined by the short word
combiner 135 to improve the performance of the speech
recognition system. It is to be appreciated that short words
are combined until the number of phonemes of a combined word
is equal to or greater than the predefined number of
phonemes. As an example, the predefined number of phonemes
may be set to six phonemes. Thus, given the two leading
words "AMERICAN" and "AAMES" in the baseform file above, the
first of these, "AMERICAN", has eight phonemes and is
therefore not regarded as a short word. Accordingly, it will
not be combined with the word that follows it. However, the
second, "AAMES", has only three phonemes and is therefore
regarded as a short word. Accordingly, it is combined with
the word that follows it in its name, "FINANCIAL", as
follows: AAMES_FINANCIAL.
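
The combining rule can be sketched in Python as follows. The
description does not state whether the rule applies to every
word of a name or only to its leading word. This sketch
assumes the latter, because the sample grammar entries leave
"GROUP" (four phonemes) uncombined. The function name, the
phoneme-count table, and the use of a single baseform per
word are likewise assumptions made for illustration.

```python
from typing import Dict, List


def combine_leading_short_word(words: List[str],
                               phoneme_count: Dict[str, int],
                               min_phonemes: int = 6) -> List[str]:
    """Join the leading word with following words until long enough."""
    if not words:
        return []
    parts = [words[0]]
    count = phoneme_count[words[0]]
    i = 1
    # Keep absorbing the next word while the combination is still short.
    while count < min_phonemes and i < len(words):
        parts.append(words[i])
        count += phoneme_count[words[i]]
        i += 1
    return ["_".join(parts)] + words[i:]


# Phoneme counts taken from the baseform file example.
counts = {"AAMES": 3, "FINANCIAL": 8, "CORP": 9, "AMERICAN": 8,
          "ANNUITY": 7, "GROUP": 4, "CAPITAL": 7, "TRUST": 5}
short_case = combine_leading_short_word(
    ["AAMES", "FINANCIAL", "CORP"], counts)
long_case = combine_leading_short_word(
    ["AMERICAN", "ANNUITY", "GROUP", "CAPITAL", "TRUST"], counts)
```

The baseform of a combined word can then be formed by
concatenating the baseforms of its parts, as in the
AAMES_FINANCIAL example.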

An example of the baseform file which includes a 
combined word is as follows: 

AAMES            AA M Z
AAMES_FINANCIAL  AA M Z F AY N AE N SH AX L
AMERICAN         AX M EH R IX K AX N
ANNUITY          AX N Y UW IX T IY
CORP             K AO R P AXR EY SH AX N
FINANCIAL        F AY N AE N SH AX L
GROUP            G R UW PD
CAPITAL          K AE P IX T AX L
TRUST            T R AH S TD

The final grammar file is then generated to include the
combined words (step 280), by the grammar generator 125. 

Thus, with respect to the two entries in the grammar 
file above, the portion of the final grammar file 
corresponding thereto is as follows: 

+0.010129856039 AMERICAN ANNUITY GROUP CAPITAL TRUST : NYSE_AAGPRT
+0.270129494365 AAMES FINANCIAL CORP : NYSE_AAM
+0.270129494365 AAMES_FINANCIAL CORP : NYSE_AAM

Although the illustrative embodiments have been 
described herein with reference to the accompanying 
drawings, it is to be understood that the present system and
method are not limited to those precise embodiments, and that
various other changes and modifications may be effected
therein by one skilled in the art without departing from the
scope or spirit of the invention. All such changes and
modifications are intended to be included within the scope
of the invention as defined by the appended claims.